Heterogeneous sparse matrix–vector multiplication via compressed sparse row format

Authors

Abstract

Sparse matrix–vector multiplication (SpMV) is one of the most important kernels in high-performance computing (HPC), yet SpMV normally suffers from ill performance on many devices. Due to this ill performance, SpMV normally requires special care to store and tune for a given device. Moreover, HPC is facing heterogeneous hardware containing multiple different compute units, e.g., many-core CPUs and GPUs. Therefore, an emerging goal has been to produce a format and method that allows critical kernels, such as SpMV, to be executed on different devices with portable performance and minimal changes to the format and method. This paper presents a format based on CSR, named CSR-k, that can be tuned quickly and on average outperforms Intel MKL on the Intel Xeon Platinum 8380 and AMD Epyc 7742, while still outperforming NVIDIA's cuSPARSE and Sandia National Laboratories' KokkosKernels on the NVIDIA A100 and V100 for regular sparse matrices, i.e., matrices where the variance of the number of nonzeros per row is ≤ 10, such as those commonly generated by two- and three-dimensional finite difference and finite element problems. In particular, CSR-k achieves this by reordering and grouping rows into a hierarchical structure of super-rows and super–super-rows that are represented by just a few extra arrays of pointers. Due to its simplicity, a model can be tuned to a device and used to select super-row and super–super-row sizes in constant time.
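
For illustration, below is a minimal sketch of how such a hierarchy can sit on top of plain CSR for k = 3. The standard row_ptr/col_idx/val arrays are unchanged; two extra pointer arrays group rows into super-rows and super-rows into super–super-rows. All struct, field, and function names here are assumptions made for the sketch, not the paper's implementation.

    /* CSR-k-style hierarchy over plain CSR (k = 3); names are illustrative. */
    typedef struct {
        int n;                /* number of rows                          */
        const int *row_ptr;   /* CSR row pointers, length n + 1          */
        const int *col_idx;   /* CSR column indices                      */
        const double *val;    /* CSR nonzero values                      */
        int n_srow;           /* number of super-rows                    */
        const int *srow_ptr;  /* first row of each super-row, n_srow + 1 */
        int n_ssrow;          /* number of super-super-rows              */
        const int *ssrow_ptr; /* first super-row of each, n_ssrow + 1    */
    } csrk_t;

    /* y = A * x: outer parallelism over super-super-rows, then super-rows;
       the innermost loop is an ordinary CSR row and vectorizes well. */
    void csrk_spmv(const csrk_t *A, const double *x, double *y)
    {
    #pragma omp parallel for schedule(static)
        for (int b = 0; b < A->n_ssrow; ++b)
            for (int s = A->ssrow_ptr[b]; s < A->ssrow_ptr[b + 1]; ++s)
                for (int i = A->srow_ptr[s]; i < A->srow_ptr[s + 1]; ++i) {
                    double sum = 0.0;
                    for (int j = A->row_ptr[i]; j < A->row_ptr[i + 1]; ++j)
                        sum += A->val[j] * x[A->col_idx[j]];
                    y[i] = sum;
                }
    }

Because the grouping is just pointer arrays over contiguous row ranges, choosing super-row and super–super-row sizes amounts to cheap relabeling, which is consistent with the constant-time tuning model described in the abstract.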


Related Articles

Sparse matrix multiplication: The distributed block-compressed sparse row library

Efficient parallel multiplication of sparse matrices is key to enabling many large-scale calculations. This article presents the DBCSR (Distributed Block Compressed Sparse Row) library for scalable sparse matrix-matrix multiplication and its use in the CP2K program for linear-scaling quantum-chemical calculations. The library combines several approaches to implement sparse matrix multiplication...
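
As a rough illustration of the underlying layout (not DBCSR's actual API; field names are assumptions), a block-compressed sparse row matrix keeps the CSR pattern at the granularity of dense blocks, so each stored entry is a small dense tile that can be multiplied by an optimized dense kernel:

    /* Illustrative block-CSR layout; names are assumptions for this sketch. */
    typedef struct {
        int nb;               /* number of block rows                    */
        int b;                /* dense block dimension (b x b)           */
        const int *brow_ptr;  /* block-row pointers, length nb + 1       */
        const int *bcol_idx;  /* block-column index of each stored block */
        const double *blk;    /* block values, b * b doubles per block   */
    } bcsr_t;

A block-times-block product is then a small dense matrix multiply, which is how block-based libraries recover dense-kernel efficiency on sparse data.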

Vectorized Sparse Matrix Multiply for Compressed Row Storage Format

The innovation of this work is a simple vectorizable algorithm for performing sparse matrix vector multiply in compressed sparse row (CSR) storage format. Unlike the vectorizable jagged diagonal format (JAD), this algorithm requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc. Numerical experiments on the Cray X1 show an order of magnitude ...
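
For context, the textbook scalar CSR kernel that such vectorized variants start from is sketched below; this is not the paper's Cray X1 algorithm. The inner trip count equals the row length, so short or irregular rows underfill the vector pipes, which is the problem JAD addresses by rearranging data and this work addresses without rearrangement:

    /* Reference scalar CSR SpMV: y = A * x. */
    void csr_spmv(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            /* Trip count = nonzeros in row i; varies row to row. */
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }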

Deblocking Joint Photographic Experts Group Compressed Images via Self-learning Sparse Representation

JPEG is one of the most widely used image compression methods, but it causes annoying blocking artifacts at low bit rates. Sparse representation is an efficient technique that can solve many inverse problems in image processing applications such as denoising and deblocking. In this paper, a post-processing method is proposed for reducing JPEG blocking effects via sparse representation. In this ...

GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging

We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using sub-warps of threads to realize an early compression effect...
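
The on-the-fly aggregation can be pictured as a sorted two-way merge. The sketch below uses two rows and illustrative names, whereas the paper merges small batches of rows at once using sub-warps of GPU threads:

    #include <stddef.h>

    /* Merge two sparse rows sorted by column index, summing values
       whose column indices collide; returns the merged row length. */
    size_t merge_rows(const int *ca, const double *va, size_t na,
                      const int *cb, const double *vb, size_t nb,
                      int *cm, double *vm)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb) {
            if (ca[i] < cb[j])      { cm[k] = ca[i]; vm[k++] = va[i++]; }
            else if (cb[j] < ca[i]) { cm[k] = cb[j]; vm[k++] = vb[j++]; }
            else {                  /* duplicate column: aggregate on the fly */
                cm[k] = ca[i];
                vm[k++] = va[i++] + vb[j++];
            }
        }
        while (i < na) { cm[k] = ca[i]; vm[k++] = va[i++]; }
        while (j < nb) { cm[k] = cb[j]; vm[k++] = vb[j++]; }
        return k;
    }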

Block-Row Sparse Matrix-Vector Multiplication on SIMD Machines

The irregular nature of the data structures required to efficiently store arbitrary sparse matrices and the architectural constraints of a SIMD computer make it difficult to design an algorithm that can efficiently multiply an arbitrary sparse matrix by a vector. A new "block-row" algorithm is proposed. It allows the "regularity" of a data structure with a row-major mapping to be varied by ...
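
One common way to realize such a trade-off, sketched here as a hypothetical illustration rather than the paper's actual scheme, is to pad each group of R consecutive rows to the group's longest row and store the group column-major, so all SIMD lanes run identical loop bounds; larger R gives more regularity at the cost of more explicit zero padding. All names are assumptions for this sketch:

    /* Hypothetical padded block-row layout; names are illustrative. */
    typedef struct {
        int nrows, R;         /* matrix rows; rows per block              */
        int nblocks;          /* ceil(nrows / R)                          */
        const int *blk_len;   /* padded row length of each block          */
        const int *blk_off;   /* offset of each block in col/val          */
        const int *col;       /* column indices; padding repeats column 0 */
        const double *val;    /* values; padding entries are 0.0          */
    } blockrow_t;

    void blockrow_spmv(const blockrow_t *A, const double *x, double *y)
    {
        for (int b = 0; b < A->nblocks; ++b) {
            int len = A->blk_len[b], base = A->blk_off[b];
            for (int r = 0; r < A->R; ++r) {      /* one SIMD lane per row */
                int i = b * A->R + r;
                if (i >= A->nrows) break;
                double sum = 0.0;
                for (int j = 0; j < len; ++j)     /* column-major in block */
                    sum += A->val[base + j * A->R + r]
                         * x[A->col[base + j * A->R + r]];
                y[i] = sum;
            }
        }
    }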


Journal

Journal title: Parallel Computing

Year: 2023

ISSN: 1872-7336, 0167-8191

DOI: https://doi.org/10.1016/j.parco.2023.102997